We study the distribution of the maximum of a set of random fitnesses with a fixed number of mutations in a model of biological evolution. The fitness variables are not independent and the correlations can be varied via a parameter $\ell = 1, \ldots, L$. We present analytical calculations for the following three solvable cases: (i) one-step mutants with arbitrary $\ell$, (ii) weakly correlated fitnesses with $\ell = L/2$, and (iii) strongly correlated fitnesses with $\ell = 2$. In all these cases, we find that the limit distribution for the maximum fitness is not of the standard Gumbel form.


Introduction: Extreme value theory [1, 2] has found applications in diverse fields ranging from the physics of disordered systems such as spin glasses [3] and driven diffusive systems [4] to hydrology [5] and finance [6]. Here we are interested in its role in a model that describes the biological evolution of an infinitely large population of asexually replicating genetic sequences. The (logarithmic) population of a sequence increases linearly with time, with the slope given by the sequence fitness and the intercept by the number $D = 0, \ldots, L$ of mutations with respect to the reference sequence [7]. It has been shown that out of the $S_D = \binom{L}{D}$ sequences present at constant $D$, the population dynamics involve only the sequence with the largest fitness at given $D$ [8].

If the sequence fitnesses are uncorrelated random variables chosen from a distribution decaying faster than a power law, the largest fitness is distributed according to the well-known Gumbel distribution. However, as several experimental and theoretical studies have indicated that realistic fitness landscapes are not completely random, we are led to study the extreme statistics of correlated fitnesses. In recent studies of the extreme statistics of strongly correlated variables, deviations from the Gumbel distribution have been shown numerically (see, for example, [10]) or by analysing the tails of the extremal distribution [11, 12], but very few analytical results for the full distribution have been obtained [13, 14]. In this Letter, we obtain analytical results for the full distribution for both weak and strong correlations and show that it has a non-Gumbel form.

Block model: We consider a block model [15] of protein evolution in which a protein sequence of length $L$ is represented by a binary string of 0's and 1's and divided into $B$ blocks of equal length $\ell = L/B$. The block fitness $f_j(d)$ gives the fitness of a block with $d$ ones in the $j$th of the $s_d = \binom{\ell}{d}$ possible arrangements; these block fitnesses are random variables, each of which is chosen independently from a common exponential distribution. The sequence fitness is given by the average of the corresponding block fitnesses, and two sequence fitnesses are correlated when they share at least one block fitness. An attractive feature of the block model is that the correlations amongst the fitnesses and the structure of the fitness landscape can be tuned with the block length $\ell$. For $\ell = 1$, the sequence fitnesses are strongly correlated and the fitness landscape is smooth, while for $\ell = L$, the model has uncorrelated fitnesses and the fitness landscape is maximally rugged. In the following, we work with even $L$ and consider $D \le L/2$, as the results for $D > L/2$ can be obtained by simply replacing $D$ by $L - D$. The number $D$ of mutations is measured with respect to the reference sequence $\{0, 0, \ldots, 0\}$.
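The construction above can be illustrated with a minimal Python sketch. It assumes unit-mean exponential block fitnesses and, as an assumption read off from the statement that one-step mutants have only $\ell$ distinct fitnesses, that every block position draws from the same table of block fitnesses. The helper names (`make_block_table`, `sequence_fitness`, `one_step_fitnesses`) are hypothetical and chosen only for this illustration.

```python
import itertools
import random

def make_block_table(ell, rng):
    # One independent Exp(1) fitness per block configuration.
    # Assumption: every block position reads from this same table.
    configs = itertools.product((0, 1), repeat=ell)
    return {cfg: rng.expovariate(1.0) for cfg in configs}

def sequence_fitness(seq, ell, table):
    # Average of the block fitnesses; sorting makes the float sum
    # independent of the order in which the blocks appear.
    values = sorted(table[tuple(seq[i:i + ell])]
                    for i in range(0, len(seq), ell))
    return sum(values) / len(values)

def one_step_fitnesses(L, ell, table):
    # Fitness of each of the L sequences carrying a single mutation.
    result = []
    for site in range(L):
        seq = [0] * L
        seq[site] = 1
        result.append(sequence_fitness(seq, ell, table))
    return result

rng = random.Random(1)
L, ell = 12, 3
table = make_block_table(ell, rng)
fitnesses = one_step_fitnesses(L, ell, table)
print(len(fitnesses), "one-step mutants,",
      len(set(fitnesses)), "distinct fitnesses")
```

Under these assumptions the $L$ one-step mutants share only $\ell$ distinct fitness values, because the position of the mutated block does not matter, only the arrangement inside it.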

One-step mutants, any $\ell$: We first consider the extremal distribution for the fitnesses which carry only one mutation, as this case can be solved for any $\ell$. Although there are $L$ one-step mutants, the number of sequences with distinct fitness is $\ell$, as the fitness $w_j$ of a one-mutant neighbor is given by

$$ w_j = \frac{(B-1)\, f_1(0) + f_j(1)}{B}, \qquad j = 1, \ldots, \ell. \tag{1} $$



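Since the cumulative distribution $\mathcal{P}_\ell(w, D, L)$ gives the probability that all the $w_j$'s are smaller than $w$, we have

$$ \mathcal{P}_\ell(w,1,L) = \int_0^\infty df_1(0)\, e^{-f_1(0)} \prod_{j=1}^{\ell} \int_0^\infty df_j(1)\, e^{-f_j(1)}\, \Theta(w - w_j) = \int_0^{\frac{Bw}{B-1}} df\, e^{-f} \left[ 1 - e^{-Bw} e^{(B-1) f} \right]^{\ell} \tag{2} $$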
where $\Theta(\cdot)$ is the Heaviside theta function. The cumulative distribution calculated using the above equation is shown in Fig. 1 for various $\ell$. For $\ell = 1$, the distribution is $\mathcal{P}_1(w) = 1 + (L-2)^{-1} \left[ e^{-Lw} - (L-1)\, e^{-Lw/(L-1)} \right]$, while the double exponential form $e^{-L e^{-w}}$ is obtained for $\ell = L$. Thus we have an example of a family of extremal distributions that interpolates between exponential and Gumbel distributions as correlations are varied.
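The two limiting cases can be checked numerically. The sketch below is an illustration and not part of the original calculation: it evaluates the single integral on the right-hand side of (2) with a composite Simpson rule and compares it with the closed form quoted for $\ell = 1$ (where $B = L$). For $\ell = L$ (a single block, $B = 1$) it assumes the maximum of $L$ independent unit-mean exponentials, $(1 - e^{-w})^L$, and compares that with the double exponential form. The function names are hypothetical.

```python
import math

def simpson(g, a, b, n=20000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def cumulative_eq2(w, ell, B):
    # Right-hand side of Eq. (2). For B = 1 there is no unmutated
    # block, so the maximum of ell exponentials is used directly.
    if B == 1:
        return (1.0 - math.exp(-w)) ** ell
    def g(f):
        x = math.exp(-B * w + (B - 1) * f)
        return math.exp(-f) * max(0.0, 1.0 - x) ** ell
    return simpson(g, 0.0, B * w / (B - 1))

def closed_form_ell1(w, L):
    # Closed form quoted in the text for ell = 1.
    return 1 + (math.exp(-L * w)
                - (L - 1) * math.exp(-L * w / (L - 1))) / (L - 2)

L, w = 10, 1.0
numeric = cumulative_eq2(w, ell=1, B=L)
exact = closed_form_ell1(w, L)

ell_big = 1000
w_big = math.log(ell_big) + 0.5
uncorrelated = cumulative_eq2(w_big, ell=ell_big, B=1)
gumbel = math.exp(-ell_big * math.exp(-w_big))
print(numeric, exact, uncorrelated, gumbel)
```

The first pair of numbers should agree to quadrature accuracy; the second pair agrees only up to corrections that vanish as the number of variables grows.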

The integral on the right hand side (RHS) of (2) does not seem to be exactly doable, but for fixed $B$ it is possible to cast it in a scaling form which turns out to be of non-Gumbel form. Following an integration by parts, (2) can be rewritten as

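[Equations (3) and (4): the displayed expressions are not recoverable from the source; (3) is an integral form of (2) and (4) an equivalent series over $n = 0, 1, \ldots$ involving a binomial coefficient.]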

It is evident from the last expression that in the limit $\ell, w \to \infty$ with $\ell e^{-Bw}$ finite, $\mathcal{P}_\ell(w,B)$ deviates from the Gumbel distribution for $B > 1$. Since the summand in (4) peaks around $n \sim e^{Bw} \gg 1$ as $w \to \infty$, the binomial coefficient can be replaced by its large-$n$ form, a power of $n$ divided by a gamma function. Replacing the sum in (4) by an integral and defining the scaling variable $u = w - B^{-1} \ln \ell$, we finally have

where the last expression holds for all $u$ except $u \to \infty$ and $\Gamma(a,x)$ is the incomplete gamma function [16]. Thus the limit distribution $\mathcal{P}_\ell(w,B)$ is a function of $u$ (see inset) and is of traveling wave form $F_B(w - vt)$ if we identify $t$ with $\ln \ell$ and the velocity $v$ with $B^{-1}$ (also see (13) below). Note that unlike previous works [11, 13] that assume the distribution to be of traveling wave form, here we have shown the existence of such a solution.
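The statement that the limit distribution depends on $w$ and $\ell$ only through $u = w - B^{-1} \ln \ell$ can be probed numerically. The following sketch is an illustration under stated assumptions: it integrates the right-hand side of (2) for $B = 2$ at two block lengths and the same $u$, and also prints the Gumbel form $\exp(-e^{-Bu})$ that the factor $(1 - e^{-Bw})^{\ell}$ alone would give. The function names are hypothetical.

```python
import math

def simpson(g, a, b, n=20000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def cumulative_eq2(w, ell, B):
    # Right-hand side of Eq. (2), valid for B >= 2.
    def g(f):
        x = math.exp(-B * w + (B - 1) * f)
        return math.exp(-f) * max(0.0, 1.0 - x) ** ell
    return simpson(g, 0.0, B * w / (B - 1))

def at_scaled_u(u, ell, B):
    # Evaluate the cumulative distribution at w = u + ln(ell) / B.
    return cumulative_eq2(u + math.log(ell) / B, ell, B)

B = 2
rows = {}
for u in (-0.5, 0.0, 0.5, 1.0):
    small = at_scaled_u(u, 10**3, B)
    large = at_scaled_u(u, 10**4, B)
    gumbel = math.exp(-math.exp(-B * u))
    rows[u] = (small, large, gumbel)
    print(f"u={u:+.1f}  ell=1e3: {small:.5f}  ell=1e4: {large:.5f}  "
          f"Gumbel: {gumbel:.5f}")
```

If the scaling picture holds, the two finite-$\ell$ columns should nearly coincide at each $u$ while staying visibly away from the Gumbel column.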

Block length $\ell = L/2$, any $D$: As $B$ is an integer, $L/2$ is the largest value of $\ell$ at which correlations are nonzero. We now turn to this case with weak correlations and show that the distribution is of non-Gumbel form for any $D$. For $B = 2$, the fitness of a sequence with $D$ ones can be obtained by averaging over the block fitnesses with $d'$ ones in the first block and $d'' = D - d'$ ones in the second block. As there are $s_{d'}$ possible fitnesses for the first block and $s_{d''}$ for the second, the sequence fitness takes the following form:

$$ w_{j,k}(d') = \frac{f_j(d') + f_k(d'')}{2}, \qquad d' = 0, \ldots, d_u, \quad j = 1, \ldots, s_{d'}, \quad k = 1, \ldots, s_{d''} \tag{6} $$

where $d_u = (D-1)/2$ for odd $D$ and $D/2$ for even $D$. The above equation gives distinct $w_{j,k}(d')$ for all $d'$ except $d' = D/2$, for which, as $d' = d''$, distinct fitnesses are obtained when the index $k$ runs from $j$ to $s_{D/2}$. Thus the number of distinct random variables is given by $\frac{1}{2} \left[ \binom{L}{D} + \binom{L/2}{D/2} (1 - D \bmod 2) \right]$ [17], which increases as $\sim \ell^{D}$. As we shall see below, the extreme value distribution depends on whether $D$ is odd or even.
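The count of distinct fitnesses for two blocks can be verified by brute force for small $L$. The sketch below assumes, as in the text, that the sequence fitness is the average of two block fitnesses drawn from a common table, so a sequence is characterised by the unordered pair of its block configurations. It compares the enumeration with $\frac{1}{2}[\binom{L}{D} + \binom{L/2}{D/2}(1 - D \bmod 2)]$; the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def distinct_fitness_count(L, D):
    # Enumerate all sequences with D ones and record the unordered
    # pair of block configurations, which fixes the fitness for B = 2.
    ell = L // 2
    seen = set()
    for ones in combinations(range(L), D):
        first = tuple(i for i in ones if i < ell)
        second = tuple(i - ell for i in ones if i >= ell)
        # frozenset drops the block order; equal blocks collapse to one entry.
        seen.add(frozenset((first, second)))
    return len(seen)

def predicted_count(L, D):
    # Count of distinct random variables quoted in the text.
    return (comb(L, D) + comb(L // 2, D // 2) * (1 - D % 2)) // 2

for L in (8, 10):
    for D in range(1, L // 2 + 1):
        print(L, D, distinct_fitness_count(L, D), predicted_count(L, D))
```

For odd $D$ the two blocks can never carry the same number of ones, which is why the second binomial term drops out.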

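(i) For odd $D$, the fitnesses $w_{j,k}(d')$ are identically distributed, as is evident from (6). The probability that all the fitnesses are smaller than $w$ is given by

$$ \mathcal{P}_{L/2}(w,L) = \prod_{d'=0}^{d_u} \prod_{j=1}^{s_{d'}} \prod_{k=1}^{s_{d''}} \int_0^\infty df_j(d')\, e^{-f_j(d')} \int_0^\infty df_k(d'')\, e^{-f_k(d'')}\, \Theta(w - w_{j,k}(d')). \tag{7} $$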

In the above expression, the product $\prod_{j=1}^{s_{d'}} \Theta(w - w_{j,k}(d'))$ in the integral over $f_k(d'')$ requires that $f_k(d'') < 2w - f_j(d')$ for all $j = 1, \ldots, s_{d'}$. It is however sufficient to satisfy $f_k(d'') < 2w - f_{\hat{j}}(d')$ where $f_{\hat{j}}(d') = \max\{f_1(d'), \ldots, f_{s_{d'}}(d')\}$. Furthermore, as $f_k(d'')$ is positive, $2w - f_j(d')$ must also be positive for all $j$, thus restricting the domain of integration over $f_j(d')$ to $2w$. Thus we can write

$$ \int_0^\infty df_k(d'')\, e^{-f_k(d'')} \prod_{j=1}^{s_{d'}} \Theta(2w - f_j(d') - f_k(d'')) = \sum_{j=1}^{s_{d'}} \Theta(2w - f_j(d')) \prod_{i=1,\, i \neq j}^{s_{d'}} \Theta(f_j(d') - f_i(d')) \left( 1 - e^{-2w + f_j(d')} \right) \tag{8} $$

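which is independent of $k$. As a result, the product over $k$ in (7) can be done using the basic properties of the Heaviside theta function. This immediately gives

$$ \mathcal{P}_{L/2}(w,L) = \prod_{d'=0}^{d_u} \sum_{j=1}^{s_{d'}} \int_0^{2w} df_j(d')\, e^{-f_j(d')} \left( 1 - e^{-2w + f_j(d')} \right)^{s_{d''}} \prod_{i=1,\, i \neq j}^{s_{d'}} \int_0^{f_j(d')} df_i(d')\, e^{-f_i(d')} \tag{9} $$

$$ = \prod_{d'=0}^{d_u} s_{d'} \int_0^{2w} df\, e^{-f} \left( 1 - e^{-2w + f} \right)^{s_{d''}} \left( 1 - e^{-f} \right)^{s_{d'} - 1}. \tag{10} $$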



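For $D = 1$, the above expression reduces to (2) with $\ell = L/2$. Following steps similar to those leading to (4), we rewrite the last equation as

$$ \mathcal{P}_{L/2}(w,L) = \prod_{d'=0}^{d_u} s_{d'} s_{d''} \left( 1 - e^{-2w} \right)^{s_{d'} + s_{d''}} \sum_{n=0}^{\infty} \frac{\left( 1 - e^{-2w} \right)^{n}}{(n + s_{d'})(n + s_{d''})}\, \frac{(n + s_{d'})!\, (n + s_{d''})!}{n!\, (n + s_{d'} + s_{d''})!}. \tag{11} $$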
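As a consistency check on passing from an integral to a series, one factor of (10) for a single $d'$ can be compared numerically with a power series in $1 - e^{-2w}$. The sketch below is an illustration, not the original derivation: the series used is the one obtained by expanding the integrand after the substitution $t = e^{-f}$, which is an assumption of this example, and the names `factor_integral` and `factor_series` are hypothetical.

```python
import math

def simpson(g, a, b, n=20000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def factor_integral(w, s1, s2):
    # s1 * int_0^{2w} df e^{-f} (1 - e^{-2w+f})^{s2} (1 - e^{-f})^{s1-1}
    def g(f):
        return (math.exp(-f)
                * max(0.0, 1.0 - math.exp(f - 2 * w)) ** s2
                * (1.0 - math.exp(-f)) ** (s1 - 1))
    return s1 * simpson(g, 0.0, 2 * w)

def factor_series(w, s1, s2, terms=2000):
    # s1*s2*z^(s1+s2) * sum_n z^n (n+s1-1)!(n+s2-1)! / (n! (n+s1+s2)!)
    # with z = 1 - e^{-2w}; factorials are handled through lgamma.
    z = 1.0 - math.exp(-2 * w)
    total = 0.0
    for n in range(terms):
        log_ratio = (math.lgamma(n + s1) + math.lgamma(n + s2)
                     - math.lgamma(n + 1) - math.lgamma(n + s1 + s2 + 1))
        total += z ** n * math.exp(log_ratio)
    return s1 * s2 * z ** (s1 + s2) * total

checks = [(1.0, 3, 5), (0.7, 1, 4), (1.0, 1, 1)]
for w, s1, s2 in checks:
    print(w, s1, s2, factor_integral(w, s1, s2), factor_series(w, s1, s2))
```

The truncation at 2000 terms is ample for the moderate values of $w$ used here; for large $w$ the series converges slowly because $1 - e^{-2w}$ approaches one.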

To find the limit distribution, we first note that the factor corresponding to $d' = 0$ in (10) is of the form (2). On comparing, we infer the scaling variable for the $d' = 0$ term to be $s_D e^{-2w} \sim \ell^{D} e^{-2w}$ when $\ell, w \to \infty$. This suggests that for arbitrary $d'$, the product $s_{d'} s_{d''} e^{-2w}$ remains finite while $s_{d'} e^{-2w}, s_{d''} e^{-2w} \to 0$ for large $\ell$ and $w$. In these scaling limits, for large $n$, we can write



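$$ \frac{(n + s_{d''})!\, (n + s_{d'})!}{n!\, (n + s_{d'} + s_{d''})!} \approx \left( \frac{n + s_{d'}}{n + s_{d'} + s_{d''}} \right)^{s_{d''}} \approx e^{-s_{d'} s_{d''}/n} = \exp\left[ -\frac{e^{-2u_{d'}}}{n\, e^{-2w}} \right] \tag{12} $$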



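where $u_{d'} = w - \ln \sqrt{s_{d'} s_{d''}}$. Approximating the sum in (11) by an integral, we finally get

$$ \mathcal{P}_{L/2}(w,L) \approx \prod_{d'=0}^{d_u} \int_0^\infty dx\, e^{-x - e^{-2u_{d'}}/x} = \prod_{d'=0}^{d_u} 2 e^{-u_{d'}} K_1\!\left( 2 e^{-u_{d'}} \right) \tag{13} $$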



where $K_n(x)$ is the modified Bessel function of the second kind [16]. Interestingly, the above distribution for $D = 1$ has the same form as the cumulative distribution for the minimum energy in a random energy model with logarithmically correlated potential [13]. However, for $D > 1$, there does not appear to be a single scaling variable.
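The Bessel form can be compared with a finite-size factor of (10). The sketch below is illustrative only: it takes a single factor with both $s_{d'}$ and $s_{d''}$ large (the values 200 and 300 are arbitrary choices for the example), evaluates it at fixed $u_{d'}$, and compares it with $\int_0^\infty dx\, e^{-x - a/x}$, $a = e^{-2u_{d'}}$, which equals $2\sqrt{a} K_1(2\sqrt{a})$. The integral representation is used so that no special-function library is needed; the function names are hypothetical.

```python
import math

def simpson(g, a, b, n=20000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def factor_eq10(u, s1, s2):
    # One factor of Eq. (10) evaluated at w = u + ln(sqrt(s1 * s2)).
    w = u + 0.5 * math.log(s1 * s2)
    def g(f):
        return (math.exp(-f)
                * max(0.0, 1.0 - math.exp(f - 2 * w)) ** s2
                * (1.0 - math.exp(-f)) ** (s1 - 1))
    return s1 * simpson(g, 0.0, 2 * w)

def bessel_form(u, x_max=60.0):
    # int_0^inf dx exp(-x - a/x) with a = exp(-2u); the integrand
    # vanishes at x = 0 and is negligible beyond x_max.
    a = math.exp(-2 * u)
    def g(x):
        return math.exp(-x - a / x) if x > 0 else 0.0
    return simpson(g, 0.0, x_max, n=60000)

for u in (-0.5, 0.0, 0.5, 1.0):
    print(u, factor_eq10(u, 200, 300), bessel_form(u))
```

The residual difference between the two columns is a finite-size effect and is expected to shrink as both block degeneracies grow.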
(ii) For even $D$, since the fitnesses $w_{j,j}(D/2)$, $j = 1, \ldots, s_{D/2}$ have a different distribution than the rest, the fitnesses are not identically distributed in this case. Using the results obtained above for $d' < D/2$ and separating the contribution due to $d' = D/2$, we can write

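$$ \mathcal{P}_{L/2}(w,L) = \prod_{d'=0}^{d_u - 1} s_{d'} \int_0^{2w} df\, e^{-f} \left( 1 - e^{-2w + f} \right)^{s_{d''}} \left( 1 - e^{-f} \right)^{s_{d'} - 1} \times \prod_{j=1}^{s_{D/2}} \prod_{k=j}^{s_{D/2}} \int_0^\infty df_j(D/2)\, e^{-f_j(D/2)}\, \Theta(w - w_{j,k}(D/2)). \tag{14} $$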
By applying the same procedure as for odd $D$, the integral over $f_j(D/2)$ can be evaluated to give $1 - e^{-w}$. Within the same scaling limits as for odd $D$, we finally obtain



Block length $\ell = 2$, any $D$: For $\ell = 2$, although the sequence fitnesses are not only strongly correlated but non-identically distributed as well, it is possible to solve for the extreme value distribution exactly. If $n_1$ and $n_2$ denote the number of blocks with fitness $f_1(1)$ and $f_2(1)$ respectively, the number of blocks with fitness $f_1(2)$ at a fixed $D$ is given by $n_3 = (D - n_1 - n_2)/2$. Furthermore, as the total number of blocks equals $B$, there are $(L - D - n_1 - n_2)/2$ blocks with fitness $f_1(0)$. Thus the fitness $w_{n_1,n_2}$ of a sequence with $D$ mutations, obtained by averaging over the block fitnesses, can be written as

$$ w_{n_1,n_2} = \frac{(L - D - n_1 - n_2)\, f_1(0) + 2 n_1 f_1(1) + 2 n_2 f_2(1) + (D - n_1 - n_2)\, f_1(2)}{L}. \tag{16} $$

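The cumulative distribution $\mathcal{P}_2(w,L)$ is given by

$$ \mathcal{P}_2(w,L) = \int_0^\infty \cdots \int_0^\infty df_1(0)\, df_1(1)\, df_2(1)\, df_1(2)\; e^{-f_1(0) - f_1(1) - f_2(1) - f_1(2)} \prod_{n_1, n_2 = n_{1,l},\, n_{2,l}}^{n_{1,u},\, n_{2,u}} \Theta(w - w_{n_1,n_2}) \tag{17} $$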



where $n_{i,u}$ ($n_{i,l}$) is the maximum (minimum) allowed value of $n_i$, $i = 1, 2$, which, as discussed below, depends on whether $D$ is odd or even. Before proceeding further, we first note that in the product over theta functions in the above integrand, only those factors in which at least one of the indices $n_1, n_2$ is zero need to be retained; the rest are redundant. To see this, consider the theta functions with a given $n_1 + n_2$. Then if $f_1(1) > f_2(1)$, the fitness $w_{n_1,n_2} \le w_{n_1+n_2,0}$, so that the condition $\Theta(w - w_{n_1,n_2})$ is automatically satisfied once $\Theta(w - w_{n_1+n_2,0})$ holds. Similarly, if $f_2(1) > f_1(1)$, it is enough to keep $\Theta(w - w_{0,n_1+n_2})$.
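The redundancy argument can be tested directly on random draws. The sketch below assumes the $\ell = 2$ fitness is the block average, $[(L - D - n_1 - n_2) f_1(0) + 2 n_1 f_1(1) + 2 n_2 f_2(1) + (D - n_1 - n_2) f_1(2)]/L$, with unit-mean exponential block fitnesses, and checks that the maximum over all allowed $(n_1, n_2)$ coincides with the maximum over pairs in which at least one index vanishes. The function names are hypothetical.

```python
import random

def fitness(n1, n2, L, D, f0, f1a, f1b, f2):
    # f0, f1a, f1b, f2 stand for f_1(0), f_1(1), f_2(1), f_1(2).
    return ((L - D - n1 - n2) * f0 + 2 * n1 * f1a + 2 * n2 * f1b
            + (D - n1 - n2) * f2) / L

def allowed_pairs(L, D):
    # The numbers of doubly mutated blocks, (D - n1 - n2)/2, and of
    # unmutated blocks, (L - D - n1 - n2)/2, must be non-negative integers.
    return [(n1, n2) for n1 in range(D + 1) for n2 in range(D + 1 - n1)
            if (D - n1 - n2) % 2 == 0 and n1 + n2 <= L - D]

def max_fitness(pairs, L, D, f):
    return max(fitness(n1, n2, L, D, *f) for n1, n2 in pairs)

rng = random.Random(7)
L = 20
gaps = []
for D in range(1, L // 2 + 1):
    pairs = allowed_pairs(L, D)
    axis = [(n1, n2) for n1, n2 in pairs if n1 == 0 or n2 == 0]
    for _ in range(200):
        f = [rng.expovariate(1.0) for _ in range(4)]
        gaps.append(max_fitness(pairs, L, D, f) - max_fitness(axis, L, D, f))
print("largest gap between the two maxima:", max(abs(g) for g in gaps))
```

Since the restricted set is a subset of the full set, any nonzero gap would signal a counterexample to the argument.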

(i) For even $D$, as $n_3 = (D - n_1 - n_2)/2$ is an integer, $n_1$ and $n_2$ should be either both odd or both even, which implies $n_{i,u} = D$, $n_{i,l} = 0$. Besides, the conditions $n_1 + n_2 \le D$, $n_i \le D$ should be satisfied as $n_3$ is nonnegative. Counting the number of possibilities, we find that the total number of distinct fitnesses increases as $((D+2)/2)^2$ for $D \le L/2$. Applying the redundancy argument given above to (17), we have



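$$ \mathcal{P}_2(w,L) = \int_0^\infty \int_0^\infty df_1(0)\, df_1(2)\, e^{-f_1(0) - f_1(2)}\, \Theta(w - w_{0,0}) \left[ \int_0^\infty df_1(1)\, e^{-f_1(1)} \prod_{n_1 = 1}^{D} \Theta(w - w_{n_1,0}) \right]^2. \tag{18} $$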



It is easy to see that the integral over $f_1(1)$ is nonzero provided $f_1(1) < \min\{\alpha + \beta, (\alpha/D) + \beta\}$, where we have defined $\alpha = (Lw - (L - D) f_1(0) - D f_1(2))/2$ and $\beta = (f_1(0) + f_1(2))/2$. For $\alpha > 0$, this condition reduces to $f_1(1) < (\alpha/D) + \beta$, while for $\alpha < 0$, we require $f_1(1) < \alpha + \beta$. Thus we obtain

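$$ \int_0^\infty df_1(1)\, e^{-f_1(1)} \prod_{n_1 = 1}^{D} \Theta(w - w_{n_1,0}) = \Theta(\alpha)\, \Theta\!\left( \frac{\alpha}{D} + \beta \right) \left( 1 - e^{-\alpha/D - \beta} \right) + \Theta(-\alpha)\, \Theta(\alpha + \beta) \left( 1 - e^{-\alpha - \beta} \right). \tag{19} $$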

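Using $\Theta(w - w_{0,0}) = \Theta(\alpha)$, we finally get

$$ \mathcal{P}_2(w,L) = \int_0^\infty \int_0^\infty df_1(0)\, df_1(2)\, e^{-f_1(0) - f_1(2)}\, \Theta(\alpha) \left( 1 - e^{-\alpha/D - \beta} \right)^2 = \int_0^{\frac{w}{1-y}} df\, e^{-f} \left( 1 - e^{-\frac{w - (1-2y) f}{2y}} \right)^2 \left( 1 - e^{-\frac{w - (1-y) f}{y}} \right) \tag{20} $$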

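where $y = D/L$. The above integral can be easily computed and an explicit exact expression for the distribution $P_2(w,L) = d\mathcal{P}_2(w,L)/dw$ is given by

$$ P_2(w,L) = \frac{-2 e^{-\frac{w}{2y}}}{1 - 4y} + \frac{2 e^{-\frac{3w}{2y}} - e^{-\frac{2w}{y}} + e^{-\frac{w}{1-y}}}{1 - 2y} + \frac{4y\, e^{-\frac{3w}{2(1-y)}}}{1 - 6y + 8y^2} + \frac{y \left( e^{-\frac{w}{y}} - e^{-\frac{2w}{1-y}} \right)}{1 - 5y + 6y^2}. \tag{21} $$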

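The mean $\bar{w}$ and the variance $\sigma^2$ calculated using $P_2(w)$ are then given by

$$ \bar{w} = 1 + \frac{55}{36}\, y, \qquad \sigma^2 = 1 - \frac{3804}{1296}\, y + \frac{8135}{1296}\, y^2. \tag{22} $$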
Thus the mean increases linearly with $y$ but the variance varies non-monotonically: it initially decreases with $y$ and then increases, with the minimum at $y^* = 3804/16270 \approx 0.234$.
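The behaviour of the mean and the variance can be checked by numerical integration of the density. In the sketch below the density is written in the form obtained here by differentiating the single-integral expression for $\mathcal{P}_2$ with respect to $w$; since the printed formula is damaged in the source, the exact placement of the terms is an assumption of this example. The values $y = 0.1, 0.2, 0.3$ avoid the removable singularities at $y = 1/4$, $1/3$ and $1/2$, and the variance minimum is located by fitting a parabola through the three points. The function names are hypothetical.

```python
import math

def simpson(g, a, b, n=40000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def density_p2(w, y):
    # Assumed form of the density for ell = 2 and even D, with y = D/L.
    e = math.exp
    return (-2 * e(-w / (2 * y)) / (1 - 4 * y)
            + (2 * e(-3 * w / (2 * y)) - e(-2 * w / y)
               + e(-w / (1 - y))) / (1 - 2 * y)
            + 4 * y * e(-3 * w / (2 * (1 - y))) / (1 - 6 * y + 8 * y * y)
            + y * (e(-w / y) - e(-2 * w / (1 - y))) / (1 - 5 * y + 6 * y * y))

def moments(y, w_max=60.0):
    # Normalisation, mean and variance of the density at given y.
    norm = simpson(lambda w: density_p2(w, y), 0.0, w_max)
    mean = simpson(lambda w: w * density_p2(w, y), 0.0, w_max)
    second = simpson(lambda w: w * w * density_p2(w, y), 0.0, w_max)
    return norm, mean, second - mean ** 2

ys = (0.1, 0.2, 0.3)
norms, means, variances = zip(*(moments(y) for y in ys))
v1, v2, v3 = variances
# Vertex of the parabola through three equally spaced points.
y_star = ys[1] - 0.1 * (v3 - v1) / (2 * (v3 - 2 * v2 + v1))
print("norms:", norms)
print("means:", means)
print("variances:", variances)
print("variance minimum near y =", y_star)
```

The parabola fit is exact only if the variance is quadratic in $y$; equal spacing of the three means is a quick check of the linear growth of the mean.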

(ii) If $D$ is odd, we require that either $n_1$ is odd and $n_2$ is even or vice versa, along with the conditions $n_1 + n_2 \le D$, $n_i \le D$. In this case, $n_{i,u} = D$, $n_{i,l} = 1$ and we obtain $(D+1)(D+3)/4$ distinct fitnesses for $D \le L/2$. Following the same reasoning as above, the cumulative distribution for odd $D$ can be written as



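$$ \mathcal{P}_2(w,L) = \int_0^\infty \int_0^\infty df_1(0)\, df_1(2)\, e^{-f_1(0) - f_1(2)} \left[ \int_0^\infty df_1(1)\, e^{-f_1(1)} \prod_{n_1 = 1}^{D} \Theta(w - w_{n_1,0}) \right]^2 \tag{23} $$

$$ = \int_0^\infty \int_0^\infty df_1(0)\, df_1(2)\, e^{-f_1(0) - f_1(2)} \left[ \Theta(\alpha)\, \Theta\!\left( \frac{\alpha}{D} + \beta \right) \left( 1 - e^{-\alpha/D - \beta} \right)^2 + \Theta(-\alpha)\, \Theta(\alpha + \beta) \left( 1 - e^{-\alpha - \beta} \right)^2 \right]. \tag{24} $$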



The first term in the above sum reduces to (20). In the second term, the condition $\Theta(\alpha + \beta)$ requires that $0 < (D-1) f_1(2) < Lw - (L - D - 1) f_1(0)$, and the condition $\Theta(-\alpha)$ can be satisfied in two cases: (i) $D f_1(2) > Lw - (L - D) f_1(0) > 0$ and (ii) $D f_1(2) > 0$, $Lw - (L - D) f_1(0) < 0$. Putting all these conditions together, the second term can be evaluated. However, for $L \gg 1$, the contribution of the second term to (24) can be neglected and we obtain the same result as for even $D$.
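The counts of distinct fitnesses quoted for $\ell = 2$, namely $((D+2)/2)^2$ for even $D$ and $(D+1)(D+3)/4$ for odd $D$, can be confirmed by enumeration for small $L$. The sketch below assumes that a sequence fitness is fixed by the pair $(n_1, n_2)$ counting the singly mutated blocks of each of the two kinds; the function names are hypothetical.

```python
from itertools import combinations

def count_distinct(L, D):
    # Enumerate all sequences with D ones, split them into blocks of
    # length 2 and record how many blocks read (1,0) and how many (0,1).
    seen = set()
    for ones in combinations(range(L), D):
        occupied = set(ones)
        n1 = n2 = 0
        for b in range(L // 2):
            left, right = 2 * b in occupied, 2 * b + 1 in occupied
            if left and not right:
                n1 += 1
            elif right and not left:
                n2 += 1
        seen.add((n1, n2))
    return len(seen)

def predicted(D):
    # Counts quoted in the text for D <= L/2.
    if D % 2 == 0:
        return ((D + 2) // 2) ** 2
    return (D + 1) * (D + 3) // 4

L = 12
for D in range(1, L // 2 + 1):
    print(D, count_distinct(L, D), predicted(D))
```

The restriction to $D \le L/2$ matters: for larger $D$ there are not enough unmutated blocks to realise every pair $(n_1, n_2)$.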

Conclusions: We have presented several analytical results for the extreme value distribution in a model with tunable correlations. When the fitnesses are strongly correlated and non-identically distributed, the full distribution is obtained exactly. For $D = 1$ and arbitrary $\ell$, we have shown that the limit distribution is of traveling wave form. As the limit distributions in the large $\ell$ limit for $D = 1$ and for $\ell = L/2$ at any $D$ are seen to be of traveling wave form, we expect that this form survives for $\ell \sim O(L)$ and any $D$. The weakly correlated model with $D = 1$ is found to obey the same extreme statistics as a random energy model with correlations. An elucidation of the connection between these two apparently unrelated models would be interesting.